Optimising Sentiment Classification using Preprocessing Techniques
Abstract
Sentiment Classification refers to computational techniques for classifying whether the sentiment of a text is positive or negative. As a specialized domain of text mining, Sentiment Classification is expected to benefit from preprocessing. In this paper we propose various models that selectively combine preprocessing techniques with Sentiment Classifiers in order to optimize Sentiment Classification. Unlike traditional preprocessing, where punctuation symbols are simply discarded, we propose a set of rules that handle words containing apostrophes before the punctuation symbols are removed. The Sentiment Classifiers, proposed in our previous research articles, are based on term weighting techniques. We evaluated the Sentiment Classification models by comparing them with state-of-the-art techniques on the movie sentence and movie document datasets. Accuracy increased from the unprocessed to the preprocessed data. Because our classifiers already handle stopwords, removing stopwords during preprocessing had hardly any impact, unlike with traditional Sentiment Classifiers. Our classifiers also achieved better accuracy than a traditional classifier and another surveyed classifier based on a term weighting technique.
Keywords: Sentiment Classification; Pre-processing; Term Weighting; Term Frequency; Term Presence; Document Vectors
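The abstract does not list the apostrophe rules themselves, so the following is only a minimal sketch of the general idea: expand words with apostrophes first, remove punctuation second, and then build term-frequency and term-presence vectors. The rule table `CONTRACTIONS` and the function names `expand_apostrophes`, `preprocess`, `term_frequency`, and `term_presence` are hypothetical and are not taken from the paper.

```python
import re
import string
from collections import Counter

# Hypothetical rule table: the paper's actual rules are not given in the abstract.
CONTRACTIONS = {
    "can't": "can not",
    "won't": "will not",
    "it's": "it is",
    "i'm": "i am",
}

def expand_apostrophes(text):
    """Rewrite words containing apostrophes before punctuation is stripped."""
    text = text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = re.sub(r"n't\b", " not", text)  # generic negation: isn't -> is not
    text = re.sub(r"'s\b", "", text)       # drop possessive: movie's -> movie
    return text

def preprocess(text):
    """Handle apostrophes first, then replace remaining punctuation with spaces."""
    text = expand_apostrophes(text)
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table).split()

def term_frequency(tokens):
    """Document vector that counts how often each term occurs."""
    return dict(Counter(tokens))

def term_presence(tokens):
    """Document vector that records only whether a term occurs (1 if present)."""
    return {term: 1 for term in set(tokens)}

tokens = preprocess("I can't say it isn't good, but the movie's plot won't win!")
tf = term_frequency(tokens)
tp = term_presence(tokens)
```

Stripping punctuation without the first step would turn "can't" into the fragments "can" and "t", which loses the negation that a sentiment classifier depends on. Expanding first keeps "not" as a separate token that term weighting can pick up.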
Similar resources
Sentiment Analysis on Web-based Reviews using Data Mining and Support Vector Machine
This work aims to use sentiment analysis techniques, data mining, text mining and natural language processing to indicate the polarity of texts using support vector machine. Weka software and a movie review database from Internet Movie Database IMDb were used. This work uses preprocessing filters and WRAPPER techniques and Support Vector Machine (SVM) for classification. It presents better resu...
A High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. Twitter lets users publish tweets to say what they think about topics. Numerous websites on the Internet present Twitter. The user can enter a sentiment ta...
Opinion Analysis on Web-based Reviews Using Support Vector Machine
This work aims to use sentiment analysis techniques, data mining, text mining and natural language processing to indicate the polarity of texts using SVM (support vector machine). Weka software and a movie review database from IMDb (internet movie database) were used. This work uses preprocessing filters and WRAPPER techniques and SVM for classification. It presents better results when compared...
Discrimination of Golab apple storage time using acoustic impulse response and LDA and QDA discriminant analysis techniques
Firmness is one of the most important quality indicators for apples and is highly correlated with storage time. The acoustic impulse response technique is one of the most commonly used nondestructive detection methods for evaluating apple firmness. This paper presents a non-destructive method for classification of Iranian apple (Malus domestica Borkh. cv. Golab) according...
A Comparison between Preprocessing Techniques for Sentiment Analysis in Twitter
In recent years, Sentiment Analysis has become one of the most interesting topics in AI research due to its promising commercial benefits. An important step in a Sentiment Analysis system for text mining is the preprocessing phase, but it is often underestimated and not extensively covered in literature. In this work, our aim is to highlight the importance of preprocessing techniques and show h...
Publication date: 2015